Looking at the Web through XML Glasses

Authors

  • Arnaud Sahuguet
  • Fabien Azavant

Abstract

The Web so far has been incredibly successful at delivering information to human users. So successful, in fact, that there is now an urgent need to go beyond the browsing human and make information accessible to applications, in order to offer automation, interoperation, and Web-awareness among services. To do so, information from Web sources needs to be accessible in a structured way. XML and its various extensions (data models, query languages) are a step in this direction. Unfortunately, the Web is not yet a well-organized repository of nicely structured XML documents but rather a conglomerate of volatile HTML pages whose structure has to be extracted. To address this problem, we present the World Wide Web Wrapper Factory (W4F), a Java toolkit for generating wrappers for Web sources. Our main contributions are: (1) an expressive language to specify the extraction of complex structures from HTML pages; (2) a declarative mapping to XML documents, with automatic generation of the corresponding DTDs; (3) visual support tools that make the engineering of wrappers faster and easier. As an illustration, we show how, via W4F intermediation, HTML sources can be transparently queried from an XML query language.
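The abstract describes a wrapper pipeline: extract structure from an HTML page, map it declaratively to XML, and derive a DTD from that mapping. The sketch below illustrates the idea with Python's standard library only. It is not W4F (which is a Java toolkit) and does not use W4F's extraction language. The names `TableRowExtractor`, `wrap_to_xml`, and `dtd_for` are hypothetical, and the sketch assumes records sit in well-formed `<tr>`/`<td>` rows with header cells in `<th>`.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableRowExtractor(HTMLParser):
    """Hypothetical extractor: collects <td> cell text, grouped by <tr>.

    Assumes well-formed table markup; <th> header cells are ignored,
    so header-only rows produce no record.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # extracted rows, each a list of cell strings
        self._row = None  # row currently being filled, or None
        self._cell = None # text fragments of the current <td>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join fragments (e.g. text split by <b>) and normalize whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # drop rows that had no <td> cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def wrap_to_xml(html, root_tag, record_tag, fields):
    """Map each extracted row to <record_tag>, one child element per field."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    root = ET.Element(root_tag)
    for row in parser.rows:
        if len(row) != len(fields):
            continue  # skip rows that do not fit the declared mapping
        record = ET.SubElement(root, record_tag)
        for name, value in zip(fields, row):
            ET.SubElement(record, name).text = value
    return root


def dtd_for(root_tag, record_tag, fields):
    """Derive a simple DTD matching the flat structure built by wrap_to_xml."""
    lines = [f"<!ELEMENT {root_tag} ({record_tag}*)>",
             f"<!ELEMENT {record_tag} ({', '.join(fields)})>"]
    lines += [f"<!ELEMENT {name} (#PCDATA)>" for name in fields]
    return "\n".join(lines)


if __name__ == "__main__":
    page = """
    <table>
      <tr><th>Title</th><th>Year</th></tr>
      <tr><td>Looking at the Web through <b>XML</b> Glasses</td><td>1999</td></tr>
    </table>
    """
    root = wrap_to_xml(page, "papers", "paper", ["title", "year"])
    print(ET.tostring(root, encoding="unicode"))
    print(dtd_for("papers", "paper", ["title", "year"]))
```

Keeping the field list separate from the extraction code loosely mirrors the separation the abstract draws between extraction and XML mapping: the same DTD is generated from the mapping alone, without looking at any page.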

Similar resources

Semantic Web Glasses and Semantic Resolvers

In this paper, the author proposes a solution to make a URI Resolver class available in different programming languages. The Semantic Resolver, derived from corresponding .NET or Java classes, resolves resources using semantics rather than physical mapping. When a Web Service consumer tries to parse information retrieved from a web service, Semantic Resolver based on XML Namespace and other indicator...

NNexus Glasses: a drop-in showcase for wikification

This paper describes a general drop-in approach for showcasing web services and web user interfaces, provided they are accessible through or realized via JavaScript and CSS. The author utilizes the Greasemonkey extension for Firefox to invade the client with user scripts, which load the desired new functionality, as if looking at the web through an enhanced pair of glasses. Such demos have so f...

Agents for the Grid: A comparison with Web Services (part II: Service Discovery)

In order to build an open, large-scale and inter-operable multi-agent system in the context of Grid computing, we are looking at integrating agent technologies with Web Services. In this paper, we address this concern for SoFAR, the Southampton Framework for Agent Research. We focus on all technical aspects of creating, deploying, and publishing agents as Web Services. Not only have we been able t...

XMLibrary Search: An XML Search Engine Oriented to Digital Libraries

The increase in the amount of data available in digital libraries calls for the development of search engines that allow users to find what they are looking for quickly and effectively. XML tagging makes it possible to add structural information to digitized content. These metadata offer new opportunities to a wide variety of new services. This paper describes the requirements tha...

Through the Looking-Glass, and What Eve Found There

Looking-glasses are web applications commonly deployed by Autonomous Systems to offer restricted web access to their routing infrastructure, in order to ease remote debugging of connectivity issues. In our study, we looked at existing deployments and open-source code to assess the security of this critical software. As a result, we found several flaws and misconfigurations that can be exploited...

Publication date: 1999